Combination Approaches in Information Retrieval: Words vs. N-grams and Query Translation vs. Document Translation
Authors
Abstract
This paper reports our proposal and experimental results for the NTCIR-4 CLIR task. For monolingual information retrieval, we use a combination strategy that integrates words and n-grams at the ranked-list level. In combining words and n-grams, we concentrate on generating several ranked lists with different retrieval characteristics from the word and n-gram indexes by incorporating feedback schemes. For cross-language information retrieval, we attempt a dictionary-based bi-directional combination of query translation and document translation, using naïve translation in both directions. Experimental evaluations on CJK monolingual and KC/KJ cross-language retrieval give promising results for both combination approaches: words vs. n-grams, and query translation vs. document translation.
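To make the idea of combining at the ranked-list level concrete, here is a minimal sketch. The abstract does not say which fusion formula the authors use, so this example assumes a weighted CombSUM over min-max normalized scores. The function names (`min_max_normalize`, `fuse_ranked_lists`), the document IDs, and the scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every document gets the same weight.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_ranked_lists(runs, weights=None):
    """Fuse several runs with a weighted CombSUM (an assumed fusion rule).

    runs    -- list of {doc_id: score} mappings, one per index or run
    weights -- optional per-run weights; defaults to equal weighting
    Returns a list of (doc_id, fused_score), best first.
    """
    if weights is None:
        weights = [1.0] * len(runs)
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            fused[doc] += weight * score
    # Sort by descending fused score; break ties by doc_id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores from a word index and an n-gram index.
word_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
ngram_run = {"d2": 0.8, "d4": 0.6, "d1": 0.2}

ranking = fuse_ranked_lists([word_run, ngram_run])
for doc, score in ranking:
    print(doc, round(score, 3))
```

The same function could in principle merge a query-translation run with a document-translation run, since both are just ranked lists. Normalizing before summing matters because scores from different indexes are generally not on a comparable scale.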
Similar resources
Knowledge-light Asian Language Text Retrieval at the NTCIR-3 Workshop
To combat the inherent complexity of text retrieval in a large number of disparate languages, scalable techniques must be developed and refined. We have been studying how well language-neutral approaches to text processing and retrieval can perform. With that goal, we participated in the third NTCIR workshop and conducted experiments using knowledge-light approaches, ones that did not attempt t...
ETH TREC-6: Routing, Chinese, Cross-Language and Spoken Document Retrieval
ETH Zurich's participation in TREC-6 consists of experiments in the main routing task, both manual and automatic runs in the Chinese retrieval track, cross-language retrieval in each of German, French and English as part of the new cross-language retrieval track, and experiments in speech recognition and retrieval under the new spoken document retrieval track. This year our routing experiments...
Exploring New Languages with HAIRCUT at CLEF 2005
JHU/APL has long espoused the use of language-neutral methods for cross-language information retrieval. This year we participated in the ad hoc cross-language track and submitted both monolingual and bilingual runs. We undertook our first investigations in the Bulgarian and Hungarian languages. In our bilingual experiments we used several nontraditional CLEF query languages such as Greek, Hunga...
Dictionary-independent translation in CLIR between closely related languages
This paper presents results from a study in which fuzzy string matching techniques were used as the sole query translation technique in Cross Language Information Retrieval (CLIR) between the closely related languages Swedish and Norwegian. It is a novel research idea to apply only fuzzy string matching techniques in query translation. Closely related languages share a number of words that are cr...
Scalable Multilingual Information Access
The third Cross-Language Evaluation Forum workshop (CLEF-2002) provides the unprecedented opportunity to evaluate retrieval in eight different languages using a uniform set of topics and assessment methodology. This year the Johns Hopkins University Applied Physics Laboratory participated in the monolingual, bilingual, and multilingual retrieval tasks. We contend that information access in a pl...
Journal title:
- Int. J. Comput. Proc. Oriental Lang.
Volume 19, Issue
Pages -
Publication date 2004